A General-Purpose Compression Scheme for Databases

Authors

  • Adam Cannane
  • Hugh E. Williams
  • Justin Zobel
Abstract

Current adaptive compression schemes such as gzip and compress are impractical for database compression because they do not allow random access to individual records. The sequitur scheme of Nevill-Manning and Witten also compresses data adaptively, achieving excellent compression but with significant main-memory requirements. A preliminary version of sequitur used a semi-static modeling approach to achieve slightly worse compression than the adaptive approach. We describe a new variant of the semi-static sequitur algorithm, ray, that reduces main-memory use and is a candidate for general-purpose compression of, and random access to, databases. We show that ray achieves better compression than an efficient Huffman scheme and popular adaptive compression techniques.
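The contrast between adaptive and semi-static modeling can be illustrated with a small sketch. This is not ray or sequitur: it uses zlib's preset-dictionary feature as a stand-in for a semi-static model, and the sample records and the helper names build_model, compress_record, and decompress_record are hypothetical. The point it shows is that once the model is fixed in a first pass, each record can be compressed and decompressed on its own.

```python
import zlib

# Hypothetical sample records; any byte strings would do.
records = [
    b"name=Adam;dept=CS;city=Melbourne",
    b"name=Hugh;dept=CS;city=Melbourne",
    b"name=Justin;dept=CS;city=Melbourne",
]

def build_model(sample):
    """First pass: build a fixed, shared model (here a zlib preset dictionary)."""
    # Keep only the tail so the dictionary fits in zlib's 32 KiB window.
    return b"".join(sample)[-32768:]

def compress_record(record, model):
    """Compress one record on its own, using only the fixed model as context."""
    c = zlib.compressobj(level=9, zdict=model)
    return c.compress(record) + c.flush()

def decompress_record(blob, model):
    """Decompress one record; needs the shared model but no other record."""
    d = zlib.decompressobj(zdict=model)
    return d.decompress(blob) + d.flush()

model = build_model(records)
store = [compress_record(r, model) for r in records]

# Random access: record 2 is decoded without touching records 0 and 1.
assert decompress_record(store[2], model) == records[2]
```

An adaptive compressor run over the whole file would instead need to decode everything before a record in order to rebuild the model state at that point, which is the limitation the abstract attributes to gzip and compress.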


Similar resources

Numerical Simulation of Shock-Wave/Boundary/Layer Interactions in a Hypersonic Compression Corner Flow

Numerical results are presented for the shock-boundary layer interactions in a hypersonic flow over a sharp leading edge compression corner. In this study, a second-order Godunov type scheme based on solving a Generalized Riemann Problem (GRP) at each cell interface is used to solve the thin shear layer approximation of the laminar Navier-Stokes (N-S) equations. The calculated flow-field shows general...

A General Compression Scheme for Databases

Compression of databases not only achieves a reduction in storage space but can reduce overall retrieval times. Current schemes such as gzip and compress are impractical for the purposes of databases as they do not allow individual records to be retrieved. A recent compression scheme, sequitur, allows quick decompression of any individual section of the database, however it uses extravagant amo...

Compression of nucleotide databases for fast searching

MOTIVATION: International sequencing efforts are creating huge nucleotide databases, which are used in searching applications to locate sequences homologous to a query sequence. In such applications, it is desirable that databases are stored compactly, that sequences can be accessed independently of the order in which they were stored, and that data can be rapidly retrieved from secondary storag...

Investigations on Path Indexing for Graph Databases

Graph databases have become an increasingly popular choice for the management of the massive network data sets arising in many contemporary applications. We investigate the effectiveness of path indexing for accelerating query processing in graph database systems, using as an exemplar the widely used open-source Neo4j graph database. We present a novel path index design which supports efficient...

Efficient Access of Compressed Data

In this paper a compression technique is presented which allows a high degree of compression but requires only logarithmic access time. The technique is a constant suppression scheme, and is most applicable to stable databases whose distribution of constants is fairly clustered. Furthermore, the repeated use of the technique permits the suppression of a multiple number of different consta...

Publication date: 1999